Test set

A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a test set has much the same role.


Rationale

Many procedures have been developed for discovering predictive relationships in data; regression analysis was one of the earliest such approaches. The data used to construct or discover a predictive relationship are called the training data set. Most approaches that search through training data for empirical relationships tend to overfit the data, meaning that they can identify apparent relationships in the training data that do not hold in general. A test set is a set of data that is independent of the training data but follows the same probability distribution. If a model fit to the training set also fits the test set well, minimal overfitting has taken place. If the model fits the training set much better than it fits the test set, overfitting has likely occurred.

Example
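The train-versus-test comparison described under Rationale can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard procedure: the synthetic data (y = 2x + 1 plus Gaussian noise), the 70/30 holdout split, and the helper names (make_data, holdout_split, fit_linear, fit_memorizer, mse) are all hypothetical. The "memorizer" is a 1-nearest-neighbour lookup, chosen only because it reproduces every training point exactly, which makes it an extreme case of overfitting.

```python
import random


def make_data(n, seed=0):
    """Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise (sd 1)."""
    rng = random.Random(seed)
    xs = [10 * i / n for i in range(n)]  # distinct x values in [0, 10)
    return [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]


def holdout_split(data, test_fraction=0.3, seed=1):
    """Shuffle, then hold out a fraction of the rows as an independent test set.

    Both sets are drawn from the same shuffled pool, so they follow the
    same distribution but share no rows.
    """
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, test set)


def fit_linear(train):
    """Ordinary least squares for y = a*x + b, fit on the training set only."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def fit_memorizer(train):
    """1-nearest-neighbour lookup: returns the y of the closest training x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def mse(model, rows):
    """Mean squared error of a model on a list of (x, y) rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


train, test = holdout_split(make_data(1000))
results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:9s} train MSE {results[name][0]:.3f}  test MSE {results[name][1]:.3f}")
```

The memorizer reports a training error of exactly zero, yet its test error should come out noticeably larger than the linear model's, while the linear model's training and test errors should be close to each other. That is the pattern the Rationale associates with overfitting and with minimal overfitting, respectively; the exact numbers depend on the assumed seeds and noise level.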
